Non-audible murmur recognition based on fusion of audio and visual streams
Abstract
Non-Audible Murmur (NAM) is an unvoiced speech signal that can be captured through body tissue using special acoustic sensors (i.e., NAM microphones) attached behind the talker’s ear. In a NAM microphone, body transmission and the loss of lip radiation act as a low-pass filter, so the higher-frequency components of a NAM signal are attenuated. Owing to factors such as this spectral reduction, the unvoiced nature of NAM, and the type of articulation, NAM sounds become similar to one another, which causes more confusions than in normal speech. In the present article, visual information extracted from the talker’s facial movements is fused with NAM speech using three fusion methods, and phoneme classification experiments are conducted. The experimental results reveal a significant improvement when NAM speech and facial information are fused.
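The abstract does not name the three fusion methods, so the following is only a minimal sketch of two schemes commonly used in audio-visual recognition, feature-level (early) fusion and decision-level (late) fusion. The function names, the toy scores, and the stream weight of 0.7 are all illustrative assumptions and are not taken from the paper.

```python
import math


def feature_fusion(audio_frames, visual_frames):
    """Early fusion (assumed scheme): concatenate the per-frame feature vectors."""
    if len(audio_frames) != len(visual_frames):
        raise ValueError("streams must have the same number of frames")
    return [a + v for a, v in zip(audio_frames, visual_frames)]


def decision_fusion(audio_logp, visual_logp, audio_weight=0.7):
    """Late fusion (assumed scheme): weighted sum of per-class log-scores.

    Returns the index of the class with the highest combined score.
    """
    combined = [
        audio_weight * a + (1.0 - audio_weight) * v
        for a, v in zip(audio_logp, visual_logp)
    ]
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical per-class scores for three phoneme classes.
audio_logp = [math.log(p) for p in (0.40, 0.45, 0.15)]   # NAM stream alone is ambiguous
visual_logp = [math.log(p) for p in (0.70, 0.10, 0.20)]  # visual stream favours class 0

audio_only = max(range(3), key=audio_logp.__getitem__)
fused = decision_fusion(audio_logp, visual_logp)
```

With these toy scores the audio-only decision is class 1, while the fused decision is class 0, which illustrates how a visual stream could resolve a confusion between similar-sounding NAM phonemes. In practice the stream weight would be tuned on held-out data.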
Related resources
Investigating the role of the Lombard reflex in non-audible murmur (NAM) recognition
In this paper, we report non-audible murmur (NAM) recognition results in noisy environments and investigate the effect of the Lombard reflex on non-audible murmur recognition. Non-audible murmur is speech uttered very quietly and captured through body tissue by a special acoustic sensor (e.g., a NAM microphone). A system based on non-audible murmur recognition can be applied in cases when privacy ...
Applications of NAM Microphone for Privacy in Human-machi
In this paper, we present the use of stethoscope and silicon NAM microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker’s ear and can capture not only normal (audible) speech, but also very quietly uttered speech (non-audible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when ...
Two-Level Bimodal Association for Audio-Visual Speech Recognition
This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, where cross-modal association is considered in two levels. First, the acoustic and the visual data streams are combined at the feature level by using the canonical correlation analysis, which deals with the problems of audio-visual synchronization and utilizing the cross-modal correlation. Second...
Multi-level Fusion of Audio and Visual Features for Speaker Identification
This paper explores the fusion of audio and visual evidences through a multi-level hybrid fusion architecture based on dynamic Bayesian network (DBN), which combines model level and decision level fusion to achieve higher performance. In model level fusion, a new audio-visual correlative model (AVCM) based on DBN is proposed, which describes both the intercorrelations and loose timing synchroni...
Joint processing of audio and visual information for multimedia indexing and human-computer interaction
Information fusion in the context of combining multiple streams of data, e.g., audio streams and video streams corresponding to the same perceptual process, is considered in a somewhat generalized setting. Specifically, we consider the problem of combining visual cues with audio signals for the purpose of improved automatic machine recognition of descriptors e.g., speech recognition/transcription,...